Abstract

Background: Host genetics influence the outcome of HCV disease. HCV is also highly mutable and escapes host
immunity. HCV genotypes are geographically distributed, and HCV subtypes have been shown to have distinct
repertoires of HLA-restricted viral epitopes, which explains the lack of cross protection across genotypes observed in
some studies. Despite this, immune databases and putative epitope vaccines concentrate almost exclusively on HCV
genotype 1 class I- epitopes restricted by the HLA-A*02 allele. While both genotype and allele predominate in
developed countries, we hypothesise that HCV variation and population genetics will affect the efficacy of
proposed epitope vaccines in South Africa. This in silico study investigates HCV viral variability within well-studied
epitopes identified in genotype 1 and uses algorithms to predict the immunogenicity of their variants from other,
less studied genotypes, and thus to rate the most promising vaccine candidates for the South African population. Six
class I- and seven class II- restricted epitope sequences within the core, NS3, NS4B and NS5B regions were
compared across the six HCV genotypes using local genotype 5a sequence data together with global data.
Common HLA alleles in the South African population are A*30:01, A*02:01, B*58:02, B*07:02, DRB1*13:01 and
DRB1*03:01. Epitope binding to 13 class I- and 8 class II- alleles was predicted using web-based prediction servers,
the Immune Epitope Database (IEDB) and ProPred. Online population coverage tools were used to assess vaccine
efficacy.

Results: Despite the homogeneity of genotype 1 and genotype 5 over the epitopes, there was limited promiscuity
to local HLA alleles. Host differences will make a putative vaccine less effective in South Africa. Of the 6
well-characterized class I- epitopes, only 2 were promiscuous, and 3 of the 7 class II- epitopes were
better conserved and promiscuous. By fine tuning the putative vaccine using an optimal cocktail of genotype 1 and
5a epitopes and local HLA data, the coverage was raised from 65.85% to 91.87% in South African Blacks.

Conclusion: While in vivo and in vitro studies are needed to confirm immunogenic epitopes, in silico HCV epitope 
vaccine design which takes into account HCV variation and host allele frequency will maximize population 
coverage in different ethnic groups. 
Background

The hepatitis C virus (HCV) is a relatively "new" virus,
only identified in 1989 [1] and first cultured successfully
in 2005 [2]. There is still much that is unknown about it,
and this has hindered the development of an effective
vaccine. The following are some of the challenges
to successful HCV vaccine design.

1) The virus is highly mutable and exists as a 
quasispecies within the host and genotypes cluster 
geographically. 

2) Host cell responses to HCV infection are poorly 
defined and inconsistent among infected individuals. 
CD4+ and CD8+ T-cell responses are also not cross- 
protective to heterologous genotypes [3] and, to date, 
there is no immunodominant epitope that is 
consistently found in HCV-positive individuals [4].

3) Humans are the only natural host of HCV, and 
suitable laboratory models have only been developed 
recently. The chimpanzee has been infected in the 
laboratory [5], but studies using this model are 
expensive and limited. The mouse model for viral 
pathogenesis studies promises a more practical and 
plausible alternative [6,7]. 

Epitope-based vaccines promote an immune response 
by presenting immunogenic peptides (viral genotype- 
specific) bound to major histocompatibility (MHC) 
molecules (host specific) to the T cell receptor. Class II- 
proteins are presented to T helper cells by antigen pre- 
senting cells (APCs) with the aid of the CD4 co-receptor 
whereas class I- proteins are presented by the infected 
target cell to cytotoxic T cells with the aid of the CD8 
co-receptor. The T helper response is important in
directing and activating the immune response, including
the effectiveness of CD8+ T cells [8]. An effective vaccine
must be capable of inducing and maintaining powerful
CD4 and CD8 T-cell immunity in the greatest proportion
of its target population.

Both HCV genotype and HLA allele frequency are dis- 
tributed geographically. Viral genotype, host genetic 
background [9] and HLA class I- [10] and class II- alleles 
[11] are associated with both HCV disease progression 
and sustained response to therapy [12]. South Africa has 
diverse ethnic groups, hence a high diversity of HLA 
genetic background [13]. Black Africans, including the
well-studied Zulu ethnic group, constitute the majority
(79.4%) of the population in the country (Statistics South
Africa, [14],
http://www.statssa.gov.za/PublicationsHTML/P03022010/html/P03022010.html).
Other major population groups include Caucasians
(Europeans and Indian/Asian, 11.8%) and those of mixed
race (8.8%). The predominant HCV genotype in South
Africa is genotype 5a. This little studied genotype
accounts for 57% of the HCV infections in South Africa,
with the very well studied genotype 1 accounting for
23% [15]. In comparison, genotype 1 accounts for 70% of
HCV infections in the USA [16]. Hence, most peptide-based
vaccine studies concentrate mainly on HCV genotype 1
epitopes restricted by HLA-A*02, which is the most
common HLA allele in populations of European/Caucasian
descent (New Allele Frequency Database [17],
http://www.allefrequencies.net).

The binding of the epitope to the HLA-molecule is a 
highly selective process as only 1 in 40-200 peptides 
would bind to the HLA class I- or II- allele with high 
affinity to produce an efficient immune response [18]. 
Computer prediction servers have made it possible to 
identify potentially strong peptide binders to HLA mole- 
cules that can then be tested in vitro and in vivo as 
putative epitopes for peptide-based vaccines. This saves
cost and time, as it is expensive and laborious to
synthesize and test several 9-mer or overlapping peptides
over long target antigens. Various computational
prediction servers are available, including more than 20
for identifying HLA-II binding peptides [19], and their
sensitivity is constantly improving.

We hypothesize that putative vaccines based on restric- 
tion by the HLA-A*02 allele and genotype 1 sequences 
will not perform optimally in South Africa. The aim of 
the study was, therefore, to investigate the heterogeneity 
of well studied HCV epitope sequences across HCV 
genotypes (with particular reference to genotype 5a) 
and assess their immunogenicity against prevalent local 
HLA-types in order to assess vaccine efficacy and popu- 
lation coverage in the ethnically diverse South African 
population. This descriptive study used web-accessible 
prediction servers to predict epitope binding of recently 
published putative epitopes for HCV vaccines against the 
South African HLA background. The main objectives of 
the study were: 

1) To characterise the variation of selected published 
immunogenic epitopes within popular target 
antigens, focusing on South African genotype 5a data. 

2) To predict the immunogenicity of these epitopes and 
their variants against the background of prevalent 
alleles in the South African target population. 

Results 

Degree of conservation between epitopes 

The Weblogo consensus was generated from individual
alignments of all available sequence data of HCV
genotypes (1a, 1b, 2, 3, 4, 5a and 6). Thus, seven web logos
were generated for each of the 13 chosen class I- (N=6)
and class II- (N=7) epitopes (Table 1). The epitopes
chosen for this study are well characterized and
referenced (Table 1). NS5B 2422-2433 has only one reference
(others have 22-78 references) but it is also the only one
that has a different restriction allele, i.e. B15. The HCV
consensus was derived from the 7 generated weblogos
and the percentage conservation within each genotype
over the epitope region was calculated as described in
the Methods (Table 2 and Additional file 1: Figure S1).

Prabdial-Sing et al. BMC Immunology 2012, 13:67
http://www.biomedcentral.com/1471-2172/13/67

Table 1 Six well studied HLA class I- and seven class II- restricted HCV immunodominant epitope sequences were
chosen from previous publications for this study

| Class | Epitope | Sequence (subtype) | Restriction | Reference | Number of references listed at IEDB |
|---|---|---|---|---|---|
| I | NS3 1073-1081 | CINGVCWTV (1a) | A02 | [20,21] | 78 |
| I | NS3 1406-1415 | KLVALGINAV (1a) | A02 | [22,23] | 70 |
| I | NS4 1807-1816 | LLFNILGGWV (1a) | A02 | [24,25] | 39 |
| I | NS4 1851-1859 | ILAGYGAGV (1) | A02 | [22] | 29 |
| I | NS5B 2422-2433 | MSYSWTGALVTP (1) | B15 | [22] | 1 |
| I | NS5B 2727-2735 | GLQDCTMLV (1) | A02 | [22] | 22 |
| II | Core 17-35 (a) | RRPQDVKFPGGGQIVGGVY (1) | Undetermined class II allele | [26] | 1 |
| II | Core 21-40 (a) | DVKFPGGGQIVGGVYLLPRR (1) | HLA-DRB1*1501 | [21,26,27] | 13 |
| II | NS3 1248-1261 | GYKVLVLNPSVAAT (1) | HLA-DRB1*1201; 1101; 1301; 0401 | [21,25,28] | 5 |
| II | NS4A 1781-1800 | LPGNPAIASLMAFTAAVTSP (1a) | Undetermined class II allele | [25] | 3 |
| II | NS4A 1801-1820 | LTTSQTLLFNILGGWVAAQL (1a) | Undetermined class II allele | [25,27,29] | 4 |
| II | NS5 2571-2590 | KGGRKPARLIVFPDLGVRVC (1a) | Undetermined class II allele | [4,25,27,29] | 4 |
| II | NS5 2661-2680 | QCCDLDPQARVAIKSLTERL (1a) | Undetermined class II allele | [27,29] | 4 |

(a) Class II- restricted epitopes in the core region are overlapping sequences.

The comparative variability of the epitope sequences
within and across the different genotypes is shown in
Table 2. Genotypes 2 and 6 have the lowest mean
intra-genotype scores for both class I- and II- epitope
sequences, indicating a greater variation among subtypes
within these genotypes. There is only one subtype within
genotype 5 so, not surprisingly, the epitope sequences,
including our sequences, from subtype 5a are relatively
conserved. Because a large proportion of sequences on
the database belong to genotype 1a or 1b, the consensus
sequences that were generated are mostly representative
of genotype 1 sequences. Mean conservation scores of
genotype 5 sequences are the same as those of genotype 1
for class I- (both had an average score of 83.5%) and
similar for class II- (87.67% versus 89.17% for genotypes
5 and 1, respectively). The intra-genotype variation was
not statistically significant for any of the epitopes
selected. Two class I- epitopes (NS4B 1807-1816 and
NS5B 2422-2433) and four of the six class II- epitopes
had the highest average conservation scores of more
than 80% (Table 2). Published class II- restricted
epitopes were, in general, better conserved than the
class I- epitopes, both within and across the genotypes
(Table 2). Some epitopes were well conserved
(NS4B 1807-1816 and NS5B 2422-2433) while others
(NS5B 2727-2735 and NS5B 2661-2680) were highly variable
(Table 2).

Table 2 The sequences of the chosen epitopes were compared to the consensus sequence and conservation scores
(as percentages) were calculated

| Class I epitope | Consensus epitope sequence | Genotype 1 | Genotype 2 | Genotype 3 | Genotype 4 | Genotype 5 | Genotype 6 | Mean across genotypes | Min. | Max. | SD | p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NS3 1073-1081 | CINGVMWTV | 78 | 67 | 67 | 67 | 78 | 67 | 70.67 | 67 | 78 | 5.680 | 0.3062 |
| NS3 1406-1415 | LTSLGLNAV | 67 | 56 | 67 | 78 | 67 | 56 | 65.17 | 56 | 78 | 8.280 | 0.1645 |
| NS4B 1807-1816 | LLFNILGGW | 100 | 78 | 78 | 100 | 100 | 78 | 89.00 | 78 | 100 | 12.049 | 0.6513 |
| NS4B 1851-1859 | ILAGYGAGV | 89 | 67 | 89 | 78 | 89 | 67 | 79.83 | 67 | 89 | 10.815 | 0.2231 |
| NS5B 2422-2433 | MSYSWTGAL | 89 | 89 | 89 | 100 | 89 | 67 | 87.17 | 67 | 100 | 10.815 | 0.406 |
| NS5B 2727-2735 | GLRDCTMLV | 78 | 56 | 44 | 78 | 78 | 33 | 61.17 | 33 | 78 | 19.823 | 0.4142 |
| Mean within genotypes | | 83.50 | 68.83 | 72.33 | 83.50 | 83.50 | 61.33 | | | | | |

| Class II epitope | Consensus epitope sequence | Genotype 1 | Genotype 2 | Genotype 3 | Genotype 4 | Genotype 5 | Genotype 6 | Mean across genotypes | Min. | Max. | SD | p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Core 17-40 | RRPQDVKFPGGGQIVGGVYLLPRR | 100 | 96 | 66 | 96 | 96 | 96 | 91.67 | 67 | 78 | 5.680 | 0.3062 |
| NS3 1248-1261 | GYKVLVLNPSVAAT | 100 | 93 | 93 | 100 | 100 | 93 | 96.50 | 93 | 100 | 3.834 | 0.32 |
| NS4B 1781-1800 | LPGNPAVASLMATAAVTSP | 85 | 80 | 95 | 85 | 90 | 65 | 83.33 | 65 | 95 | 10.327 | 0.4142 |
| NS4B 1801-1820 | LTTSQTLLFNILGGWVASQL | 85 | 65 | 80 | 90 | 85 | 70 | 79.17 | 65 | 90 | 9.703 | 0.962 |
| NS5B 2571-2590 | KGGRKPALIVYPDLGVRVC | 80 | 80 | 90 | 95 | 95 | 80 | 86.67 | 80 | 95 | 7.527 | 0.2231 |
| NS5B 2661-2680 | QCCDLEPEARVAIKSLTERL | 85 | 55 | 70 | 80 | 60 | 50 | 66.67 | 50 | 85 | 14.023 | 0.4159 |
| Mean within genotypes | | 89.17 | 78.17 | 82.33 | 91.00 | 87.67 | 75.67 | | | | | |
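The summary columns of Table 2 can be checked with a few lines of Python. The sketch below is illustrative only: it assumes the SD column is a sample standard deviation (n - 1 denominator), which is inferred from the tabulated numbers rather than stated in the text, and the variable and function names are not from the study. Under that assumption it reproduces the tabulated mean and SD for NS3 1073-1081 (70.67 and 5.680) and the class I- "mean within genotypes" row.

```python
from statistics import mean, stdev

# Class I conservation scores (%) from Table 2, genotypes 1-6 in order.
class_i_scores = {
    "NS3 1073-1081": [78, 67, 67, 67, 78, 67],
    "NS3 1406-1415": [67, 56, 67, 78, 67, 56],
    "NS4B 1807-1816": [100, 78, 78, 100, 100, 78],
    "NS4B 1851-1859": [89, 67, 89, 78, 89, 67],
    "NS5B 2422-2433": [89, 89, 89, 100, 89, 67],
    "NS5B 2727-2735": [78, 56, 44, 78, 78, 33],
}


def summarise(scores):
    """Mean, range and sample SD of one epitope's scores across genotypes."""
    return {
        "mean": round(mean(scores), 2),
        "min": min(scores),
        "max": max(scores),
        "sd": round(stdev(scores), 3),  # assumption: sample SD (n - 1)
    }


# One summary per epitope (the "mean across genotypes" columns).
per_epitope = {name: summarise(v) for name, v in class_i_scores.items()}

# Column-wise means (the "mean within genotypes" row).
per_genotype = [round(mean(col), 2) for col in zip(*class_i_scores.values())]

if __name__ == "__main__":
    for name, summary in per_epitope.items():
        print(name, summary)
    print("Mean within genotypes:", per_genotype)
```

The same two helpers apply unchanged to the class II- rows if their scores are substituted.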

Most epitopes were identified using genotype 1a
sequences, hence it follows that the epitope sequences 
had greater identity with genotype 1. Genotype 4 epitope 
sequences showed a consistently high degree of corres- 
pondence with the consensus but since this genotype 
was represented by the smallest data set, this may not be 
a true reflection of variation within the genotype. Geno- 
type 6 showed the most variability, with a mean conser- 
vation score of 61.33% within this genotype, which is to 
be expected since this genotype is known to be highly 
variable (Table 2). 



Major HLA alleles 

The most common HLA-A, -B and -C alleles in the
South African Black population are classified into
supertypes as described previously [30]. For example, and
as seen in Table 3, the A02 supertype includes the
A*02:01 and A*68:02 alleles. The A*30:01 allele belongs
to the supertype A01A03. This study predicted binding to
13 HLA class I- alleles in 8 supertypes and 8 class II-
HLA-DR alleles predominant in the South African
population.

Epitope binding prediction 

The predicted binding values of the published and 
"newly predicted" epitopes to prevalent local class I- 
alleles were generated using the IEDB, ANN prediction 
server (Tables 3 and 4, respectively). Predicted binding 
values of the published epitopes to local HLA class II- 
alleles were generated using the prediction server 
Propred, Quantitative matrix (Table 5). 

HLA-A and -B class I- restricted binding 

Binding predictions of epitopes and their variants for all
available HLA alleles prevalent in the South African
population are shown in Table 3. Five of the six HLA
class I- published epitopes (NS3 1073-1081, NS3 1406-1415,
NS4B 1807-1816, NS4B 1851-1859 and NS5B 2727-2735) have
been reported to be HLA-A*02 restricted (Table 1).
Three of the five published HLA-A*02 restricted epitopes
bound the A*02:01 allele as expected (Table 3).

Predictions for the different alleles were in agreement
regardless of the programme or algorithm used (IEDB
ANN, ProPred I, SYFPEITHI), with two exceptions: binding
of the 9 amino acid epitopes of NS4B 1807-1816
LLFNILGGWV, and the HLA-B*27:05 binding predictions.

The original 10 amino acid NS4B 1807-1816 genotype 1
epitope LLFNILGGWV (which is conserved in genotypes 1b,
4 and 5a) was predicted to bind with high affinity (IC50
of 44.1 nM) to HLA-A*02:01. Neither IEDB ANN nor
ProPred I predicted binding between this allele and the
two possible 9-mer epitopes, LLFNILGGW and
LFNILGGWV, while SYFPEITHI predicted binding of
18% and 14%, respectively. One of the shortcomings of
IEDB ANN is that it can only predict binding peptides
that are of the same length as those in the training set.
For this reason, all peptides were re-analysed with all the
alleles of interest using the "any length" parameter for
epitope length. No other changes were observed to
binding predictions listed in Table 3 using these parameters.

The second exception observed was the failure of IEDB
ANN to predict binding between any of the epitopes (or
their variants) and HLA-B*27:05, which SYFPEITHI and/or
ProPred I scored. There were no data supporting
restriction of these particular peptides by B*27:05 in the
IEDB epitopes database. Both SYFPEITHI and ProPred I use
peptide motifs and amino acid matrix based prediction.
The following scores using x-[R(K)]-x(6-9) could explain
the scoring of these two packages for NS3 1406-1415
epitopes KLVALGINA, KLSGLGINA (21% ProPred I, 7%
SYFPEITHI, respectively) and variants KLQDCTMLV and
KLRDCTLLV (32% ProPred I, 12% SYFPEITHI, respectively).
SYFPEITHI uses x-[R]-x(5-8)-[LFYRHK(MI)]. However, one
would expect lower predictions for NS5B 2422-2433
epitopes MSYSWTGAL and MSYTWTGAL (38% ProPred I, 12%
SYFPEITHI), since only the carboxyl anchor is present,
but this was not the case.

NS3 1073-1081, NS4B 1851-1859 and NS5B 2727-2735 bound
with high affinity to the A*02:01 allele, regardless of
genotypic variation (Table 3). All variants tested for both
NS5B 2727-2735 and NS4B 1851-1859 were predicted to bind
the A*02:01 allele with equal strength (IC50 < 20 nM,
Table 3). High and intermediate binding affinities over
all variants were also observed for NS3 1073-1081 and
NS4B 1851-1859 with allele A*68:02 (Table 3), of the A02
supertype.

Table 3 Binding affinity scores of published epitopes and their variants were determined by the IEDB prediction program to relevant supertypes in South
Africa

| Gene (reported restriction) | Epitope sequence | Genotype of epitope | A*01:01 (A01) | A*02:01 (A02) | A*68:02 (A02) | A*23:01 (A24) | A*30:01 (A01A03) | A*29:02 (A01A24) | B*07:02 (B07) | B*35:01 (B07) | B*53:01 (B07) | B*57:01 (B58) | B*58:01 (B58) | B*15:03 (B27) | B*27:05 (B27) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NS3 1073-1081 (A*02) | CINGVCWTV | 1a | 17802 | 67 | 61 | 14908 | 15501 | 12611 | 23637 | 20927 | 25523 | 19827 | 13679 | 19257 | 23485 |
| | CVNGVCWTV | 1b | 16997 | 110 | 20 | 12228 | 13122 | 11766 | 21885 | 15696 | 13382 | 18288 | 12132 | 20367 | 23007 |
| | SISGVLWTV | 2a variant | 18961 | 11 | 16 | 21483 | 11417 | 11417 | 22455 | 22186 | 29702 | 18590 | 15055 | 15691 | 20667 |
| | TVGGVMWTV | 3a | 19940 | 64 | 8 | 12677 | 14750 | 9776 | 20729 | 21877 | 24623 | 16182 | 18054 | 26500 | 24303 |
| | AVNGVMWTV | 4a variant | 17734 | 23 | 14 | 24001 | 4015* | 12036 | 10753 | 20258 | 20595 | 17093 | 12996 | 13641 | 18882 |
| | CINGVLWTV | 5a | 15172 | 26 | 39 | 17548 | 13613 | 13865 | 23524 | 21854 | 15854 | 18628 | 11203 | 17516 | 21090 |
| | CINGVMWTL | 5a variant | 17922 | 140 | 101 | 10449 | 14413 | 11435 | 18947 | 13165 | 11237 | 2239 | 13165 | 13572 | 19956 |
| NS3 1406-1415 (A*02) | KLVALGINA | 1a | 22719 | 273 | 15048 | 32261 | 1830 | 18800 | 24242 | 25216 | 37253 | 23529 | 20557 | 4839 | 19019 |
| | KLSGLGLNA | 1b | 19133 | 475 | 21824 | 33559 | 2557 | 13152 | 20740 | 27147 | 37083 | 23891 | 19220 | 8973 | 18099 |
| | QLTSLGLNA | 4a | 20013 | 7051 | 15292 | 33674 | 12859 | 12517 | 26454 | 24440 | 37244 | 22168 | 26218 | 7165 | 19904 |
| | KLVALGINAV | 1a | 37929 | 52 | 8564 | 39134 | NO VALUE | 31977 | 19547 | 42247 | 34339 | NO VALUE | NO VALUE | NO VALUE | 26021 |
| | LTGLGINAV | 5a | 12100 | 5692 | 304 | 32426 | 10980 | 20519 | 21309 | 20981 | 33652 | 25012 | 21599 | 12577 | 26332 |
| | QLTGLGINA | 5a variant | 22408 | 6972 | 7419 | 34672 | 13389 | 17488 | 26117 | 23541 | 36968 | 25569 | 22283 | 15466 | 20054 |
| NS4B 1807-1816 (A*02) | LLFNILGGW | 1a, 1b, 4, 5a | 22942 | 14359 | 17095 | 18086 | 17906 | 9175 | 24903 | 19854 | 17154 | 956* | 962* | 5918 | 23118 |
| | MFFNILGGVW | 3a | 24613 | 23482 | 19706 | 343 | 15640 | 1707* | 21757 | 11817 | 8151 | 10769 | 1251 | 13832 | 26621 |
| | LLFNILGGWV | 1a, 1b, 4, 5a | 32231 | 44 | 1159* | 38969 | NO VALUE | 19453 | 32445 | 40287 | 25767 | NO VALUE | NO VALUE | NO VALUE | 25868 |
| NS4B 1851-1859 (A*02) | ILAGYGAGV | 1a, 1b, 5a | 20500 | 15 | 530* | 30882 | 15492 | 10120 | 11883 | 21134 | 37213 | 22934 | 20702 | 3735 | 20143 |
| | ILAGYGTGV | 5a variant | 20351 | 18 | 193 | 32028 | 17493 | 12563 | 11272 | 21994 | 36657 | 23555 | 20603 | 2196 | 19849 |
| NS5B 2422-2433 (B*15) | MSYSWTGAL | 1a, 1b, 4 | 12612 | 1522 | 24 | 2924 | 2372 | 5457 | 1530* | 50 | 8456 | 10166 | 523* | 80 | 16876 |
| | MSYTWTGAL | 5a | 12133 | 2640 | 22 | 8602 | 2141* | 7606 | 2515* | 58 | 9150 | 10680 | 787* | 144 | 17267 |
| | YTWTGALIT | 5a variant | 15779 | 3000 | 13286 | 33166 | 13737 | 1561 | 18979 | 3920 | 27619 | 22480 | 17360 | 6553 | 18765 |
| NS5B 2727-2735 (A*02) | GLQDCTMLV | 1a | 18371 | 8 | 5733 | 11972 | 13187 | 6275 | 20996 | 27015 | 35681 | 25282 | 22002 | 10687 | 17601 |
| | KLQDCTMLV | 1b | 17735 | 7 | 3878 | 6160 | 2071* | 9527 | 17308 | 26776 | 35038 | 23310 | 18296 | 3587 | 16634 |
| | KLRDCTLLV | 5a | 19744 | 13 | 14912 | 15150 | 10 | 5150 | 2800 | 27145 | 36627 | 21481 | 20362 | 1720* | 18071 |
| | ALRDCTMLV | 4a | 19976 | 19 | 4673 | 19836 | 29 | 9982 | 5384 | 26302 | 36740 | 24190 | 22343 | 1206* | 20027 |

Values are predicted IC50 in nM.
IC50 < 50 nM: high affinity.
IC50 > 50 nM and < 500 nM: intermediate affinity.
IC50 > 500 nM, marked *: poor affinity.
NO VALUE indicates that the server produced no binding score.

Table 4 Binding affinity scores of "newly predicted" epitopes and their variants were determined by the IEDB prediction program to relevant supertypes in
South Africa

| Gene | Epitope sequence | Genotype of epitope | A*01:01 (A01) | A*02:01 (A02) | A*68:02 (A02) | A*23:01 (A24) | A*30:01 (A01A03) | A*29:02 (A01A24) | B*07:02 (B07) | B*35:01 (B07) | B*53:01 (B07) | B*57:01 (B58) | B*58:01 (B58) | B*15:03 (B27) | B*27:05 (B27) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NS3 | LTGPTPLLY | 5a, 1b | 15 | 23679 | 24474 | 24873 | 4551 | 5 | 24599 | 6188 | 7688 | 448 | 28 | 1558 | 22842 |
| | LHGPTPLLY | 1a | 10396 | 24884 | 27469 | 21381 | 17350 | 10 | 26731 | 12561 | 6443 | 21175 | 9987 | 442 | 23420 |
| | FLSTATQTF | 5a | 165 | 15329 | 17845 | 3634 | 1663 | 1886 | 15839 | 40 | 17977 | 16320 | 4231 | 8 | 18662 |
| | IVSTAAQTF | 1a | 20409 | 23323 | 22013 | 4758 | 11756 | 5496 | 11246 | 75 | 13372 | 814* | 425 | 55 | 22273 |
| | VLSTVTQSF | 1b, 2a | 18550 | 13712 | 17004 | 4838 | 16940 | 5785 | 15666 | 988* | 29052 | 11492 | 1654 | 26 | 21745 |
| | IVSTDTQSF | 4a | 19885 | 22289 | 20020 | 12440 | 14943 | 5300 | 6080 | 151 | 8229 | 4394 | 973* | 47 | 24757 |
| | TLAGPKGPV | 5a, 6a | 23444 | 2081* | 13 | 33957 | 16907 | 18949 | 6657 | 21854 | 39095 | 25027 | 22108 | 20499 | 22034 |
| | TLASPRGPV | 1b | 22044 | 1451* | 8 | 32375 | 13790 | 18855 | 2453 | 21660 | 39379 | 25346 | 22237 | 6235 | 22501 |
| | TLASSRGPV | 2a | 22034 | 857* | 11 | 29481 | 9965 | 20464 | 2095 | 19022 | 38571 | 24511 | 22225 | 3353 | 22267 |
| | TLASAKHPA | 3a | 21914 | 413 | 49 | 29038 | 10681 | 19694 | 16284 | 12935 | 39038 | 24637 | 21967 | 13015 | 23364 |
| | TIASPKGPV | 1a | 22885 | 7397 | 7 | 34010 | 15054 | 20663 | 7437 | 19533 | 38620 | 25493 | 22070 | 16303 | 24145 |
| | SVIDCNSAV | 5a | 21948 | 30 | 9 | 24435 | 8789 | 12923 | 1702 | 4571 | 35486 | 21627 | 21514 | 3381 | 25586 |
| | SVTDCNTCV | 1b | 21476 | 131 | 24 | 30991 | 15202 | 21169 | 19345 | 15846 | 19349 | 26045 | 22021 | 11521 | 22609 |
| | SVIDCNVAV | 1b, 2a, 6a | 21855 | 15 | 6 | 22019 | 7833 | 13308 | 3218 | 4376 | 31399 | 24317 | 21463 | 3412 | 24232 |
| | SVIDCNTCV | 1a | 22281 | 25 | 13 | 23478 | 14812 | 17452 | 17390 | 13879 | 20769 | 25334 | 21666 | 7032 | 23918 |
| | SVIDCNTSV | 4a | 22543 | 18 | 9 | 25166 | 10636 | 15522 | 3402 | 11164 | 17097 | 24124 | 20942 | 4646 | 24512 |
| | ITYSTYGKF | 1b, 5a, 2a, 2b, 1a, 4a | 16829 | 22979 | 16133 | 124 | 9722 | 352 | 21954 | 6132 | 16141 | 354 | 43 | 27 | 20982 |
| | LTYSTYGKF | 3a | 14296 | 22834 | 13829 | 263 | 10036 | 379 | 22076 | 3345 | 11660 | 860* | 41 | 31 | 20046 |
| | KVLVLNPSV | 1a, 1b, 2a, 2b, 4a, 5a, 6a | 23587 | 50 | 6303 | 27046 | 21 | 18669 | 14670 | 21450 | 31145 | 20648 | 8842 | 3129* | 18558 |
| | RAKAPPPSW | 5a, 1b, 2a, 6a | 25817 | 25080 | 27568 | 8387 | 308 | 25791 | 7172 | 18126 | 8580 | 31 | 11 | 596* | 22382 |
| | RAQAPPPSW | 1b, 3a, 1a | 24980 | 24747 | 27454 | 22992 | 6443 | 24136 | 6017 | 14212 | 3253 | 38 | 8 | 1675* | 17482 |
| | KVWLAPPPSW | 4a | 24000 | 4927 | 22172 | 26746 | 170 | 16220 | 18770 | 9620 | 39029 | 21580 | 20215 | 12296 | 20633 |
| | LTSLGVNAV | 5a | 5815 | 3795 | 42 | 33629 | 6533 | 20008 | 16663 | 13886 | 27357 | 24243 | 18277 | 3860* | 24767 |
| | LTSLGLNAV | 5a variant | 5305 | 3082 | 64 | 32917 | 6065 | 18186 | 16615 | 16431 | 29952 | 24579 | 19519 | 7004 | 23118 |

Values are predicted IC50 in nM.
IC50 < 50 nM: high affinity.
IC50 > 50 nM and < 500 nM: intermediate affinity.
IC50 > 500 nM, marked *: poor affinity.

Table 5 Binding affinity scores (as percentages) of Class II published epitopes and their variants were determined by
the ProPred prediction program to common DRB1* alleles prevalent in the South African population

| Epitope | Sequence | HCV genotype specificity | DRB1*0101 | DRB1*0102 | DRB1*0301 | DRB1*0401 | DRB1*0701 | DRB1*1101 | DRB1*1301 | DRB1*1501 |
|---|---|---|---|---|---|---|---|---|---|---|
| Core 17-42 | RRPQDVKFPGGGQIVGGVYLLPRRGP | 1, 2, 5 & 3 var & 6 var | | | | | | | | |
| | VYLLPRRGP | 1, 2, 4, 5, 6 | 0.0% | 0.0% | 18.0% | 0.0% | 0.0% | 16.0% | 48.0% | 18.0% |
| | VGGVYLLPR | 1, 2, 4, 5, 6 | 0.0% | 0.0% | 17.0% | 0.0% | 9.0% | 9.0% | 10.0% | 20.0% |
| NS3 1248-1261 | GYKVLVLNPSVAAT | 1, 2, 4, 5, 6 | | | | | | | | |
| | LVLNPSVAA | 1, 2, 3, 4, 5, 6 | 37.0% | 54.0% | 36.0% | 47.0% | 28.0% | 17.0% | 34.0% | 39.0% |
| | YKVLVLNPS | 1, 2, 4, 5, 6 | 5.0% | 0.0% | 0.0% | 30.0% | 9.0% | 31.0% | 27.0% | 17.0% |
| NS4B 1781-1800 | LPGNPAIASLMAFTAAVTSP | 1a, 4 var | | | | | | | | |
| | LPGNPAVAS | 2, 3, 5, 6 | 0.0% | 2.0% | 0.0% | 4.0% | 0.0% | 0.0% | 9.0% | 0.0% |
| | LPGNPAIAS | 1, 4 | 0.0% | 0.7% | 15.0% | 4.0% | 0.0% | 2.4% | 0.0% | 7.0% |
| | IASLMAFTA | 1 | 7.0% | 23.0% | 0.0% | 0.0% | 4.0% | 0.0% | 14.0% | 21.0% |
| NS4B 1801-1820 | LTTSQTLLFNILGGWVAAQL | 1a, 1b var | | | | | | | | |
| | LFNILGGWV | 1, 4, 5 | 0.0% | 0.0% | 16.0% | 0.0% | 24.0% | 0.0% | 16.0% | 28.0% |
| | FNILGGWVA | 1, 4, 5 | 47.0% | 47.0% | 0.0% | 2.0% | 16.0% | 28.0% | 16.0% | 31.0% |
| | ILGGWVASQ | 4, 5 | 0.0% | 0.0% | 28.0% | 0.0% | 0.0% | 2.4% | 8.0% | 0.0% |
| | LGGWVASQI | 4, 5 | 0.0% | 0.0% | 0.0% | 0.0% | 21.0% | 0.0% | 13.0% | 21.0% |
| NS5B 2571-2590 | KGGRKPARLIVFPDLGVRVC | 1, 2 var & 6 var | | | | | | | | |
| | VFPDLGVRV | 1 | 0.0% | 0.0% | 34.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| | VYPDLGVRV | 3, 5 | 0.0% | 0.0% | 35.0% | 0.0% | 14.0% | 0.0% | 0.0% | 19.0% |
| | IVYPDLGVR | 3, 5 | 0.0% | 0.0% | 28.0% | 0.0% | 0.0% | 0.0% | 7.0% | 0.0% |
| | LIVYPDLGV | 3, 5 | 0.0% | 0.0% | 0.0% | 0.0% | 12.0% | 0.0% | 3.0% | 60.0% |
| NS5B 2661-2680 | QCCDLDPQARVAIKSLTERL | 5 var | | | | | | | | |
| | LAPEARQAI | 1b | 0.0% | 0.0% | 8.0% | 0.0% | 11.0% | 0.0% | 4.5% | 11.0% |
| | LDPQARVAI | 5 | 0.0% | 0.0% | 8.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| | LQPEARAAI | [?] var | 0.0% | 0.0% | 22.0% | 0.0% | 12.0% | 1.0% | 22.0% | 26.0% |

Two of the variants, SISGVLWTV (genotype 2a) and
TVGGVMWTV (genotype 3a), had changes from the
wild type N (asparagine) in position 3 but none of the
variants had changes in positions 4, 5 and 7.
Interestingly, when all possible alanine exchange peptides
were placed into IEDB ANN, the output scores reflected the
experimental binding changes for all of the alanine
exchange peptides with the exception of the total
abrogation of signal for substitutions in positions 3, 4 and 5
(data not shown). Of note, while consistent binding was
observed across the supertype A02 for all of the variants
of the A*02 restricted epitope NS3 1073-1081, epitopes of
genotypes 1, 3a and 5a (variant) were found to be
intermediate binders (Table 3).
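The set of "alanine exchange peptides" mentioned above can be enumerated mechanically. The following is a minimal sketch, assuming the scan was applied to the genotype 1a NS3 1073-1081 sequence CINGVCWTV; the exact peptide list submitted to IEDB ANN is not given in the text, and the function name `alanine_scan` is illustrative.

```python
def alanine_scan(peptide):
    """Return (position, variant) pairs, each with one residue replaced by alanine.

    Positions are 1-based, matching the numbering used in the text.
    Residues that are already alanine are skipped, because substituting
    them would not change the peptide.
    """
    variants = []
    for index, residue in enumerate(peptide):
        if residue == "A":
            continue
        variant = peptide[:index] + "A" + peptide[index + 1:]
        variants.append((index + 1, variant))
    return variants


# Assumed input: the genotype 1a epitope listed in Table 1.
scan = alanine_scan("CINGVCWTV")

if __name__ == "__main__":
    for position, variant in scan:
        print(position, variant)
```

Each variant could then be submitted to a binding predictor and its score compared with that of the unmodified peptide; that comparison step is not shown here.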

The genotype 4a and 5a variants of the HLA-A*02
restricted epitope NS5B 2727-2735 displayed some level of
promiscuity as these were predicted to bind with high
affinity to the A01A03 supertype allele, A*30:01 (IC50 of
29 and 10 nM, respectively), while the genotype 1b variant
had low affinity with this allele (IC50 of 2071 nM) and the
original genotype 1a peptide was not predicted to bind
at all. The original peptide and one of the two
variants of the published B*15-restricted NS5B 2422-2433
epitope displayed intermediate binding, with IC50 values
of 80 and 144 nM (Table 3). This epitope showed the highest
cross-reactivity across the supertypes, with both the
original epitope and one of the genotype 5a variants
binding very strongly to A*68:02 (supertype A02) and
B*35:01 (B07 supertype; Table 3).

Of the 6 class I- epitopes used in this study, only two
epitope variants were found to be promiscuous:
MSYTWTGAL (supertypes A02, B07, B27) and
KLRDCTLLV (A02, A01A03). In a preliminary attempt
to identify conserved epitopes showing greater
promiscuity across supertypes, strings of epitopes (other than
the ones selected from publications for this study) of the
NS3 protein were placed into the IEDB server. Table 4
indicates that five of the eight epitopes were predicted
to be promiscuous, binding with high (IC50 < 50 nM)
and intermediate (IC50 < 500 nM) affinities to two or
more supertypes: LTGPTPLLY (A01, A01A24, B58),
FLSTATQTF (A01, B07, B58, B27), ITYSTYGKF (A24,
A01A24, B58, B27), KVLVLNPSV (A02, A01A03),
RAKAPPPSW (A01A03, B58). Of the five epitopes above,
three were conserved among genotypes 1, 2, 4 and 5
(Table 4): ITYSTYGKF, KVLVLNPSV and RAKAPPPSW.
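The promiscuity calls above follow from the affinity thresholds in the footnotes of Tables 3 and 4 combined with the allele-to-supertype layout of the Table 3 header. The sketch below is one way to express that rule; it is an illustration, not the procedure the authors ran. The helper names are hypothetical, and treating an IC50 of exactly 50 nM as intermediate is an assumption, since the footnotes leave the boundary undefined.

```python
# Allele-to-supertype assignment as laid out in the header of Table 3.
SUPERTYPE = {
    "A*01:01": "A01", "A*02:01": "A02", "A*68:02": "A02",
    "A*23:01": "A24", "A*30:01": "A01A03", "A*29:02": "A01A24",
    "B*07:02": "B07", "B*35:01": "B07", "B*53:01": "B07",
    "B*57:01": "B58", "B*58:01": "B58",
    "B*15:03": "B27", "B*27:05": "B27",
}
ALLELES = list(SUPERTYPE)  # column order of Table 3


def affinity_class(ic50_nm):
    """Classify a predicted IC50 (nM) using the table footnote thresholds."""
    if ic50_nm < 50:
        return "high"
    if ic50_nm < 500:  # assumption: exactly 50 nM counts as intermediate
        return "intermediate"
    return "poor"


def bound_supertypes(ic50_row):
    """Supertypes with at least one high or intermediate binder.

    ic50_row holds one value per allele in Table 3 column order;
    None stands for "NO VALUE".
    """
    hits = set()
    for allele, ic50 in zip(ALLELES, ic50_row):
        if ic50 is not None and affinity_class(ic50) != "poor":
            hits.add(SUPERTYPE[allele])
    return sorted(hits)


# Rows copied from Table 3 (asterisks dropped).
msytwtgal = [12133, 2640, 22, 8602, 2141, 7606, 2515, 58, 9150, 10680, 787, 144, 17267]
klrdctllv = [19744, 13, 14912, 15150, 10, 5150, 2800, 27145, 36627, 21481, 20362, 1720, 18071]

if __name__ == "__main__":
    print("MSYTWTGAL:", bound_supertypes(msytwtgal))
    print("KLRDCTLLV:", bound_supertypes(klrdctllv))
```

Under these assumptions an epitope would be called promiscuous when `bound_supertypes` returns two or more supertypes, which matches the two class I- variants named in the text.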
Class II- alleles

ProPred II was used to predict binding of the longer 
class II- epitopes. Before calculating the predicted bind- 
ing, the programme identifies all overlapping nine amino 
acid peptides within the input polypeptide. A predicted 
binding score is given as a percentage of the max- 
imum possible binding (i.e. the highest log value achiev- 
able by an optimal peptide) with the chosen allele 
(Table 5). For example, CORE 17 " 42 , RRPQDVKFP- 
GGGQIVGGVYLLPRRGP, returned two 9-mer peptides, 
VYLLPRRGP and VGGVYLLPR, which scored simi- 
larly for alleles HLA-DRB1*03:01 and HLA-DRB1*15:01 
(Table 5). However, in the context of DRB1*13:01, 
VYLLPRRGP had a much higher percentage binding 
score (48%) than its flanking sequence VGGVYLLPR 
(10%). Note that no class II- epitopes were predicted in 
the first 14 amino acids of CORE 17 " 42 . The CORE 17 " 42 
epitope was well conserved across the genotypes (second 
only to NS3 1248 " 1261 , Table 2), but was not predicted to 
bind with HLA-DRB1*01:01, HLA-DRB1*01:02 or HLA- 
DRB1*04:01 and only VGGVYLLPR was predicted to 
bind with HLA-DRB1*07:01 (9%, Table 5). 
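
The first step described above, splitting the input polypeptide into all overlapping nine amino acid peptides, can be illustrated with a short sketch. This is not ProPred code; the function name is an assumption, and only the CORE 17-42 sequence and the two peptides named in the text are taken from the study.

```python
def overlapping_kmers(sequence, k=9):
    """Return every overlapping peptide of length k, in sequence order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# CORE 17-42 as given in the text (26 amino acids).
CORE_17_42 = "RRPQDVKFPGGGQIVGGVYLLPRRGP"
nonamers = overlapping_kmers(CORE_17_42)

if __name__ == "__main__":
    for offset, peptide in enumerate(nonamers):
        # offset is the 0-based start within the input polypeptide
        print(offset, peptide)
```

Both peptides reported for this epitope, VGGVYLLPR and VYLLPRRGP, start after the first 14 amino acids of the input, which is consistent with the observation above.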

The most promiscuous class II- epitope was also the
best conserved epitope, NS3 1248-1261 (Table 2). Specifically,
the region 1252-1260, LVLNPSVAA, bound all
eight of the alleles tested and was the only epitope to
bind HLA-DRB1*04:01. The allele HLA-DRB1*15:01 was
predicted to bind with all but five of the 18 peptides
output by the program (Table 5). The highest percentage
of optimal binding (60%) was predicted between
peptide LIVYPDLGV within NS5B 2571-2590 and the
HLA-DRB1*15:01 allele. This immunogenic epitope is
one of three variants common to genotypes 3 and 5.

The NS3 1248-1261 epitope YKVLVLNPS was well conserved
among genotypes and bound to three DRB1*
alleles (Table 5). Interestingly, the epitope KVLVLNPSV,
also conserved, bound to two class I- supertypes 
(Table 4). Another epitope that is a class I- and II- binder 
is FNILGGWVA (Table 3 and Table 5, respectively). 

Coverage calculations 

The predicted binding scores of published epitopes 
(Tables 3 and 5) were used to estimate population cover- 
age. Selected programme output (which includes a list of 
the input epitopes) has been supplied as supplementary 
figures where indicated. 

IEDB population coverage The published class I- and 
II- epitopes had coverage of 65.85% (Additional file 2: 
Figure S2) in South African Blacks and 81.36% 
(Additional file 3: Figure S3) in South African Whites. 
Corresponding figures when calculations included only 
the class I- epitopes were 41.76% and 52.70%, respectively
(results not shown). By choosing predominantly
genotypes 1 and 5a epitopes ("best mix") predicted to be
immunogenic in South African Blacks, the combined
class I- and II- coverage in Blacks improved to 91.87%
(Additional file 4: Figure S4) while coverage improved to 
94.77% (Additional file 5: Figure S5) in the South African 
Whites. 

OptiTope Population Coverage The same OptiTope candidate
epitopes were proposed whether the chosen population
was "North American Europeans" or Europe
(geographical), and results showed coverage of 94.28%
(Additional file 6: Figure S6). Alternatively, candidate 
epitopes were sought using the same HCV alignment 
data and choosing the Zulu ethnic group (the only South 
African ethnic group available in OptiTope) and coverage 
of 75.16% was shown (Additional file 7: Figure S7). 

OptiTope epitopes and IEDB population coverage

Candidate epitopes chosen for "optimal" vaccines for 
Caucasians and Zulus, respectively, from the OptiTope 
analyses described above, were then tested using the 
South African white and black populations. Local popu- 
lation data was placed into the IEDB population coverage 
web application as before. 

Results indicated that South African Blacks had a 
72.64% chance of responding to a putative European "op- 
timal" vaccine while the same vaccine provided 90.55% 
coverage in the population for which it was designed. 
The putative "optimal" vaccine for Zulus provided cover- 
age of 73.72% in South African Blacks with 90.79% cover- 
age in Europeans (summarized in Additional file 8: 
Figure S8). 

Discussion 

HCV genotypes and host genetics vary geographically 
and yet proposed epitope vaccines are most often for- 
mulated based on genotype 1 peptide sequence data 
alone and their restriction confined to the alleles found 
predominantly in the Caucasian population. This study 
assesses the efficacy of a putative epitope vaccine 
designed with this typical sequence bias when used in 
South African populations. The heterogeneity of epitope 
regions proposed for HCV vaccines was explored to- 
gether with their predicted binding, and that of their 
variants, to HLA alleles common in the South African
population. 

There is a need to examine viral variation within 
known epitopes, and assess the prevalence and immuno- 
genicity of the variants for relevant host alleles within 
the target population, before choosing epitopes for inclusion
in an epitope vaccine. This study, therefore, focused
on subtype 1a, 1b and 5a sequences as these were
found to predominate in South Africa [15]. This is the
first time that South African genotype 5a data is being
compared to well-studied epitope data of other genotypes.
Genotypes 3 and 4 have also been found in the
South African population but genotype 2 is rare and, to 
date, genotype 6 has not been identified. In order to im- 
prove the representation of genotype 5a, all available 
sequence data was included in the alignments, includ- 
ing sequences from our own studies and those of [31] 
(Belgium and South Africa) and [32] (France). 

There are numerous epitopes meeting the inclusion 
criteria that could have been chosen for the study but a 
final subset was chosen so that it included well studied 
epitopes considered for multi-epitopic [22], therapeutic 
[21], minigene [25] and DNA polytope [23] vaccines. 
Genotype 1 is a well-studied genotype and considerably 
more sequences were available for the genotype 1 align- 
ments. Class I- and II- epitope sequences of genotype 5a 
were found to be relatively conserved compared to some 
of the other genotypes, notably genotypes 2, 3 and 6. 
Genotype 5 is considered to be a relatively conserved 
genotype as to date, there is only one subtype of geno- 
type 5 (5a), compared to the highly intra-genotypically 
variable genotype 6 that partitions into 22 different sub- 
types, 6a-6v, considerably more than any of the other 
genotypes [33]. 

There have been several studies which show a lack of
cross-protection across the genotypes [34-36]. With regard
to the NS3 1073-1081 epitope, an extensively studied
epitope, our study has predicted high and intermediate
binding of variant sequences to the A02 supertype, indicating
a level of cross-reactivity for this epitope. The consensus
at position 2 of NS3 1073-1081 was an isoleucine (I).
The only other common amino acid in this anchor position
was valine (V). Valine was conserved at position 9
in all but the genotype 5a sequences, where approximately
one third of the sequences had a leucine (L) in
this position. Despite the fact that substitutions at P2
were conservative (an I or V for the more favourable L),
affinity of this epitope was lowered. When alanine exchange
peptides were used in in vitro assays [37], substitutions
at positions 3, 4, 5 and 7 of the published
NS3 1073-1081 epitope abolished IFN-gamma production.
Changes at positions 2, 8 and 9 only partially reduced
production and only positions 1 and 6 had no effect.
Even single amino acid exchanges at non-anchor sites
can significantly limit the potential efficacy of a vaccine
containing only the wild type peptide [37].

[36] identified distinct polymorphism profiles of genotypes
1a and 3a non-structural gene sequences. Only 2
of the 51 polymorphisms, observed to have significant 
HLA association, were common to both genotypes [36]. 
The extent of genetic diversity can result in a distinct 
repertoire of HLA-restricted viral epitopes for different 
genotypes. When we looked at consensus alignments of
the chosen epitopes, we also observed this phenomenon.
The consensus at each site of an epitope represents the 
amino acid best adapted to T cell responses across the 
host population [36]. A consequence of this is that es- 
cape of a mutant (driven by the selection pressure of 
dominant HLA alleles within the host population) can 
become the most dominant amino acid. When this hap- 
pens, the polymorphism in the epitope, or negatope, as 
it is now called, is over-represented even in hosts not 
having the allele which drove the escape [36].

One of the shortcomings of IEDB ANN is that it 
can only predict binding peptides that are of the 
same length as those in the training set. Hence, the 
server will not pick up binding in longer epitopes if 
this is not specified [38]. However, by using older
programs, such as SYFPEITHI and BIMAS, that use peptide
motifs and amino acid matrix based prediction ([39];
Singh and Raghava 200), both of which are popular,
updated and have relevance [40], we were able to flag the
longer epitopes and repeat the prediction in IEDB ANN 
for the 10 amino acid epitope. 

Epitopes which are well conserved and show good 
binding affinities to many HLA alleles (promiscuous) are 
the best candidates for in vitro and/or in vivo testing. 
Epitopes like NS4B are particularly appealing
since they contain substrings which act as class I- and
class II- restricted epitopes. While in silico planning has been found
to greatly facilitate peptide design, not all peptides pre- 
dicted in silico are optimally immunogenic in vivo [41] 
and it remains essential to test predicted peptides in vivo 
so as to ascertain that the needed T-cell response is eli- 
cited. Numerous in silico studies have shown the value 
of using prediction programs to assess the efficiency of 
binding of putative epitopes to human alleles [42-45]. 
Also, [46] showed an increase in the use of in silico pre- 
diction studies with an improvement of epitope prediction 
programs available. Of the published epitopes used in this 
study, only 2 class I- (based on binding to >supertypes) 
and 3 class II- (binding to >2 DRB1* alleles) epitopes were 
found to be promiscuous using the prediction programs. 

The NS3 protein is a large protein and has been 
shown to generate effective immune responses, which 
can resolve acute infection. This study looked across the 
NS3 protein to identify possible additional epitopes 
(other than the ones chosen from the published papers) 
that may be good binders to predominant HLA-alleles 
in the South African population. The results of this
search (Table 4), which we have called "newly predicted"
NS3 epitopes, were found to be well conserved and to
bind to more than one HLA class I- allele. Three class I-
epitope sequences were found to be highly conserved, 
particularly among genotypes 1 and 5, and were pre- 
dicted to be strong binders to two or more supertypes. 
None of these "newly predicted" NS3 epitopes were 
found on the Los Alamos HCV immunology database
(http://hcv.lanl.gov/content/immuno/tables/ctl_summary.html,
accessed 05-09-2012). This exercise illustrates the
usefulness of in silico studies to identify potential binders 
which will suit the target populations. In vivo studies will 
always be needed to confirm immunogenicity of these 
predicted peptides but this study has shown that in silico 
prediction can consider both host and viral variation, par- 
ticularly in countries like South Africa and Egypt where 
genotypes other than genotype 1 predominate. In silico 
coverage calculations can not only identify promiscuous 
epitopes but also optimise the best cocktail for an effective 
multi-epitope vaccine. A recent in silico study identified 
69 promiscuous HCV class I- and 150 class II- epitopes 
that were predicted to bind to genotype 3a [44]. A string 
of 18 conserved and promiscuous immunodominant 
epitopes spanning 8 HIV-1 proteins produced an effective
immunogen [47], 23 epitopes were found to be promiscuous
for MHC class I- and II- within the E. coli 536 genome
[45] and 15 promiscuous epitopes were predicted within
M. tuberculosis peptide [43].

This study focused mainly on A02-restricted epitopes
and promiscuity was poor. However, immunogenic
epitopes restricted to other alleles have been identified
[48-50]. Two B alleles, B57 and B27, have been found to
provide spontaneous control of HCV. Neither of these
alleles is prevalent in South African Blacks (Paximadis
et al., 2011) but preliminary investigations on the NS5B
(B*57-restricted) epitope, KSKKTPMGF (genotype 1a,
[48]), and genotype 5a variants RSKKTPMAF and
KSKKIPMAF showed promiscuity to B*58:01, B*15:03
and A*30:01 (data not shown). Indeed, this reiterates the
need to look at viral variation and promiscuity as this is 
particularly important to vaccine design. 

The following class I- and II-restricted epitopes were 
selected from the original epitope set as likely to provide 
the best vaccine in the South African setting. This was 
based on binding affinities predicted for epitopes 
expected in the local population and binding to several 
supertypes recently recommended for inclusion in a vac- 
cine which is optimal for both White and Black South 
Africans (supertypes A1, A2, B07, B27 and B58; [13]).

1. NS3 1073-1081, both wild type genotype 1a
CINGVCWTV and genotype 1b CVNGVCWTV,
because they are so well studied and show
cross-reactivity within variants and across the supertype
A02.

2. NS4B 1807-1816 (LLFNILGGWV; [22,24,25]) because
the 10-mer peptide is well conserved (genotypes 1a,
1b, 4, 5a) and is immunogenic for both class I- and
class II- alleles.

3. NS5B 2422-2433, both the original MSYSWTGAL
(genotypes 1a, 1b and 4; Table 3; [22]) and the
genotype 5a variant MSYTWTGAL as they cover the
supertypes B27 as well as B07 and are also the best
available B58 candidate in the recommended
supertype set [13].

4. NS5B 2727-2735 genotype 5a variant KLRDCTLLV of
the published epitope sequence GLQDCTMLV [22]
as it brings the most prevalent HLA-A allele in the
Black population (A*30:01) and the most prevalent
HCV genotype 5a in South Africa into the mix.

5. The class II-restricted epitope NS3 1252-1260,
LVLNPSVAA [27], which is conserved in all
genotypes and also very promiscuous.

6. NS4B 1809-1817 which overlaps class I-restricted 1807
(FNILGGWVA; [25]) and is restricted by the 2 HLA-DR
alleles in the Black population (HLA DRB1*13:01
and *11:01) and is also promiscuous.

7. Core class II- epitope VYLLPRRGP (genotypes
1, 2, 4, 5, 6) included as it is the most reactive of the
class II- epitopes to HLA DRB1*13:01.

The frequencies of the most common HLA alleles in 
the South African Caucasian and Indian populations 
closely correlate with values from their respective popu- 
lations globally. However, the frequencies of the most 
common HLA-A and -B alleles in the South African 
Black population are both heterogeneous and unique 
and quite distinct even from other Black populations in 
Western and Northern Africa [51]. Many of the well 
studied published and "newly predicted" epitopes 
assessed in this study bound to A*68:02 (supertype A02). 
HLA-A*68:02 was found 2.6x more often in the Black 
population than HLA-A*68:01 (A03 supertype, [13]). 

There is a good correlation between immunogenicity 
and MHC class I- binding affinity [52]. Based on this 
principle, several web-based resources are available 
which can assess the population coverage of putative 
epitope vaccines based on the predicted binding of the 
epitopes and their variants to chosen HLA alleles relevant 
to the population being assessed. The predicted coverage
of the original well studied class I- and II- epitopes selected
for this study to illustrate the drawbacks of a vaccine,
using South African host population frequencies, was
found to be 65.85% and 81.36% for Blacks and Whites,
respectively (Additional file 8: Figure S8). The OptiTope
example highlighted the fact that greater knowledge
of local viral variation and the immunogenicity of
these variants, together with accurate high resolution
population allele frequencies, allows the design of superior
epitope vaccines with much better coverage for more
groups within the target population. Fine tuning the vaccine
by using an optimal cocktail of genotype 1 and 5a
epitopes raised the coverage of the vaccine to 91.87% and 
94.77%, close to the 100% coverage predicted by [13] in 
their study population. 

Conclusion

In light of data generated in this study, epitope-based 
HCV vaccines should contain a mixture of epitope var- 
iants from all of the genotypes as wild-type genotype 1 
response is not guaranteed to cross-protect against var- 
iants, even if the variant is restricted by the same allele. 
In addition the efficacy of a proposed epitope vaccine 
will differ between the major population groups. While 
coverage estimates can be made based on South African 
supertypes, cross-reaction of peptides with all supertype 
members is not universal. Clearly for a set of epitopes to 
elicit a broad and potent immune response in the target 
population, viral variation and population genetics data 
should be factored into the algorithm particularly in the 
light of less-studied variants such as genotype 5a.

Even where proposed epitopes are conserved, host 
differences will make the vaccine less effective in the 
South African setting. Of the 13 published and well- 
characterised epitopes selected for this analysis (including 
variants from two of these) four class I- and three class II-restricted
epitopes would be beneficial in a multi-epitopic
therapeutic vaccine for genotype 5a infection in our
population. Hepatitis C genotype and high resolution
population data are necessary when planning epitope vaccine
design. While in vivo and in vitro studies are needed
to confirm predicted immunogenic epitopes, in silico 
"reverse immunology" studies provide a sound basis 
with which to screen the many possible candidates. This 
study has shown that with the ease and usefulness of 
web-based sequence- and structure-based prediction ser- 
vers, non-bioinformaticians can predict potential binders, 
without expensive computer hardware and programming 
knowledge. 

Methods 

Epitope sequences 

The literature was searched for known immunogenic 
class I- and II-restricted epitope vaccine candidates. 
All of the open reading frames (ORF), from the core to 
the NS5B protein, yielded putative epitopes and these 
ranged in length from 9 base pairs (bp; [22]) to 683 bp 
[53]. Six class I- and seven class II- epitopes were 
chosen for the analyses (Table 1) based on the following 
criteria: 

1. All were extensively studied immunogenic epitopes 
(as indicated by the number of references in Table 1). 

2. All had been published in the peer reviewed 
literature. 

3. All class I- epitopes had known HLA restriction. 

4. All had been recommended for putative vaccines. 

5. All were from conserved regions of the genome (core 
to NS5 region). 



Alignments of representative reference sequences were 
obtained over the chosen putative epitope regions using 
sequence data from each of the genotypes with the aid 
of pre-aligned and updated amino acid sequence data 
from the International Nucleotide Sequence Database 
Collaboration (INSDC; [54]). 

The total number of sequences, available per epitope re- 
gion, varied in numbers by genotype and region on the 
genome. Genotype 1 (subtypes 1a and 1b) sequences form
by far the major number of sequences on the database 
ranging from 54% (of the total number of sequences) to 
84% in some regions. In contrast, the little studied geno- 
types, genotype 4 and 5, accounted for only 4 to 24% 
of available sequences, respectively. Genotype 5a is one 
of the major genotypes found in South Africa together 
with genotype 1. Thus, to have this local type adequately 
represented in the data set, we included our own se- 
quence data (25 patients) from the core [GenBank: 
JX571010-JX571031], NS4B [GenBank: JX571032- 
JX571039] and NS5B [GenBank: DQ482799-DQ482824] 
regions of genotype 5a. Care was taken to ensure that all
our own data, as well as data used from public databases, 
corresponded to one sequence per subject. The study 
was retrospective and approved by the ethics committee 
of the University of the Witwatersrand, Johannesburg, 
South Africa (WITS HREC M051114), and was therefore 
performed in accordance with the ethical standards of 
the 1964 Declaration of Helsinki. PCR and sequencing
were performed as previously described [15,31].

BioEdit (version 7.0; [55]) was used to align all the
amino acid sequences. The consensus sequence of im- 
munogenic regions, for each of the genotypes, was gener- 
ated using the Web based software package, WebLogo 
(version 2.8.2; http://weblogo.berkeley.edu/logo.cg; 2008- 
09-08). Sequence numbering is according to [56]. 
WebLogo produces a consensus of the input sequences 
output as a series of "letter stacks", each representing a 
single column of the sequence alignment (Additional 
file 1: Figure S1). The height of each letter within the
stack is proportional to the relative frequency of the representative
amino acid at that position in the sequence
[57]. The WebLogo software incorporates a "small sample
number" correction, to correct for potential bias. 
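
As an illustration of the quantity that each letter stack summarises, the sketch below computes per-column relative amino acid frequencies for a toy alignment. It is not WebLogo's implementation: the small sample correction and any scaling of the overall stack height are deliberately left out, the function name is an assumption, and the four-sequence alignment is invented for the example (it reuses the NS3 1073-1081 variants named in this paper).

```python
from collections import Counter


def column_frequencies(alignment):
    """Return, for each alignment column, a dict of residue -> relative frequency."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    frequencies = []
    for column in zip(*alignment):
        counts = Counter(column)
        frequencies.append({aa: n / len(column) for aa, n in counts.items()})
    return frequencies


# Toy alignment (hypothetical counts, for illustration only).
TOY_ALIGNMENT = ["CINGVCWTV", "CINGVCWTV", "CVNGVCWTV", "CINGVCWTL"]

if __name__ == "__main__":
    for position, freqs in enumerate(column_frequencies(TOY_ALIGNMENT), start=1):
        print(position, freqs)
```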

The relative conservation of each epitope was calcu- 
lated as a percentage of the number of polymorphic sites 
over the epitope length when compared to the overall 
HCV consensus sequence. The HCV consensus was 
determined by taking the most common amino acid at 
each amino acid site of the 7 respective genotype consensus
sequences (genotypes 1a, 1b, 2, 3, 4, 5a and 6), irrespective
of representation in the database. A minimal
class I-restricted epitope length of 9 amino acids was used
for all class I-restricted epitopes. Since class II-restricted
epitopes are longer and are made up of numerous
overlapping regions, the number of amino acids per epitope
varied. The statistical analysis was performed using
the analysis of variance (ANOVA) tests of significance in 
the Statistica software, version 9.1. 
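
A minimal sketch of the conservation measure described above follows. It assumes equal-length, gap-free consensus sequences and breaks ties in favour of the residue seen first, which the text does not specify. Only the genotype 1a and 1b sequences are taken from this paper; the other five consensus strings are hypothetical placeholders, and the function names are illustrative.

```python
from collections import Counter


def majority_consensus(genotype_consensuses):
    """Most common residue at each site across the genotype consensus sequences.

    Ties go to the residue encountered first (Counter.most_common ordering).
    """
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*genotype_consensuses)
    )


def percent_polymorphic(epitope, consensus):
    """Polymorphic sites as a percentage of epitope length."""
    if len(epitope) != len(consensus):
        raise ValueError("epitope and consensus must have the same length")
    differences = sum(a != b for a, b in zip(epitope, consensus))
    return 100.0 * differences / len(epitope)


# Genotype consensus sequences over one 9-mer region.
GENOTYPE_CONSENSUS = {
    "1a": "CINGVCWTV",  # from the text
    "1b": "CVNGVCWTV",  # from the text
    "2": "CINGVCWTV",   # hypothetical
    "3": "CVNGVLWTV",   # hypothetical
    "4": "CINGVCWTV",   # hypothetical
    "5a": "CINGVCWTL",  # hypothetical
    "6": "CVNGACWTV",   # hypothetical
}

if __name__ == "__main__":
    hcv_consensus = majority_consensus(GENOTYPE_CONSENSUS.values())
    for genotype, sequence in GENOTYPE_CONSENSUS.items():
        print(genotype, round(percent_polymorphic(sequence, hcv_consensus), 1))
```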

Common South African HLA alleles 

Initially, a literature search was conducted in order to 
collate available South African population HLA-A, -B and
-DR allele frequency data which included relevant data
stored online in the New allele Frequency Database
(http://www.allelefrequencies.net, 2010-11-30). However,
much of this data was low resolution with 2 digits. 
Hence, high resolution data [13], which is required for 
the predictions, were used for the study. 

Immunogenicity prediction and population coverage 
calculations 

Two servers (Immune Epitope Database, IEDB,
http://tools.immuneepitope.org, [58]; and Propred II,
http://www.imtech.res.in/raghava/propred/index.html, [59]) were
chosen for this study because these were user-friendly,
easily available online and displayed many of the HLA
alleles prevalent in SA. To predict binding to HLA class I-
alleles, the IEDB server was used. The Propred II server
was used to predict binding to HLA class II- alleles.

Resources of the immune epitope database (IEDB) 

The IEDB is a manually curated database of experimentally
characterized immune epitopes. Its companion
site, the IEDB resource, is a collection of tools for
prediction and analysis of immune epitopes
(http://tools.immuneepitope.org/main/jsp/menu.jsp; version 2.0,
accessed 2009-09-09 to 2011-03-14, [60]). The "Peptide 
Binding to MHC class I- molecules" resource, which 
predicts MHC binding to T cell epitopes, was utilised 
for class I- predictions. Valid input data include pro- 
teins or peptides. The programme splits these into all 
possible overlapping peptides and then predicts their 
binding to each selected MHC allele using the chosen 
prediction method. The sequence-based method, using 
the artificial neural network (ANN) algorithm of [61] 
on the IEDB server was selected for all HLA class I- 
predictions as it is reported to be more reliable than 
earlier matrix algorithms [61]. 

In addition, however, the matrix-based methods, 
ProPred 1 (http://www.imtech.res.in/raghava/propredI/index.html,
2010-11-30, [62]) and SYFPEITHI [39] were
used in parallel and binding efficiencies of the three 
methods compared. For brevity, only scores for IEDB 
are shown in the result tables and incompatible results 
are discussed where appropriate. ANN uses training data 
from the IEDB to calculate the affinity of a given peptide 
for specific MHC molecules. It calculates binding based 
on the position of each amino acid in the putative
epitope while taking into account the probability of adjacent
amino acids competing for a space in the MHC
pocket. Predicted binding efficiencies are calculated in
units of IC50 nM (the half-maximal inhibitory concentration).
IC50 values <50 nM indicate high affinity while
values >500 but <5000 nM indicate low affinity and values
in between the two extremes (>50 nM but <500 nM) indicate
intermediate affinity
(http://tools.immuneepitope.org/main/jsp/menu.jsp).
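
The affinity bands above translate directly into a small classifier. The sketch below is illustrative only: the text leaves the exact boundary values (50 and 500 nM) and values of 5000 nM or more unassigned, so placing boundaries in the weaker band and the "non-binder" label are assumptions made here.

```python
def classify_ic50(ic50_nm):
    """Map a predicted IC50 (nM) to the affinity bands used in this study."""
    if ic50_nm < 0:
        raise ValueError("IC50 cannot be negative")
    if ic50_nm < 50:
        return "high"
    if ic50_nm < 500:
        return "intermediate"
    if ic50_nm < 5000:
        return "low"
    # Assumption: values of 5000 nM or more are treated as non-binding.
    return "non-binder"


if __name__ == "__main__":
    for value in (12.0, 230.0, 1800.0, 25000.0):
        print(value, classify_ic50(value))
```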

Sequence data in the NS3 region that was available on
the database was used for the genotype 5 conservation
score and binding to predominant HLA-alleles in the
South African context was predicted. The promiscuity of
"newly predicted" (i.e. other than published epitopes)
class I- epitopes of the NS3 gene was analysed using the
IEDB server. An epitope sequence that bound with an IC50 of <500
nM to more than one HLA class I- allele was considered
promiscuous.
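
The promiscuity criterion can be expressed as a simple filter over per-allele predictions. In the sketch below the function names are assumptions and the IC50 values are invented purely to exercise the rule; they are not results from this study.

```python
def binding_alleles(predictions, threshold_nm=500.0):
    """Alleles whose predicted IC50 (nM) is below the threshold, sorted by name."""
    return sorted(allele for allele, ic50 in predictions.items() if ic50 < threshold_nm)


def is_promiscuous(predictions, threshold_nm=500.0):
    """True when the epitope binds more than one allele below the threshold."""
    return len(binding_alleles(predictions, threshold_nm)) > 1


# Hypothetical predictions for one epitope (allele -> IC50 in nM).
EXAMPLE_PREDICTIONS = {
    "A*02:01": 32.0,
    "A*68:02": 410.0,
    "A*30:01": 2200.0,
    "B*58:02": 18000.0,
}

if __name__ == "__main__":
    print(binding_alleles(EXAMPLE_PREDICTIONS))
    print(is_promiscuous(EXAMPLE_PREDICTIONS))
```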

ProPred MHC class II- binding prediction 

A structure-based method with a quantitative matrix
(QM) algorithm on the Propred II server
(http://www.imtech.res.in/raghava/propred/index.html, 2010-10-20,
[63]) was used to predict binding of HLA class II- epitopes.
This tool uses a linear prediction model which
scores the binding potential of the query peptide based on 
values stored in allele specific coefficient tables, or quanti- 
tative matrices. Matrices are generated based on experi- 
mental results taking into account the properties of each 
individual amino acid and its position within the epitope. 

The program is useful in locating promiscuous, versus 
allele specific, binding regions in a query peptide sequence.
Note that, in contrast to IEDB ANN, a high
score is indicative of good binding between the relevant
peptide and the specific HLA allele and vice versa. The
score represents the percentage binding of the query 
peptide when compared to the highest possible binding 
score for the optimal peptide with the given allele and 
thus reflects the binding characteristics of the query 
peptide. However, there is no clear cut off as with IEDB 
ANN scoring, and actual percentages should not be 
compared between alleles. The stringency threshold of 
the analysis can be set between 1% and 10% where 
the highest stringency guarantees no false positives 
and the lowest stringency guarantees no false negatives. 
The highest stringency was, therefore, used in all 
programme runs to minimize the number of false posi- 
tives and ensure that all binding had significance. 
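
The percentage score described above can be illustrated with a toy linear model: the query peptide's summed position weights are divided by the sum that an optimal peptide would achieve for the same allele. This is a sketch under stated assumptions, not ProPred's code. The matrix below is invented, covers only three positions for brevity, and uses non-negative weights; residues absent from a column are assumed to score zero. The server's stringency threshold is not modelled.

```python
def matrix_score(peptide, matrix):
    """Sum the position-specific weight of each residue (0.0 if not listed)."""
    return sum(matrix[i].get(aa, 0.0) for i, aa in enumerate(peptide))


def percent_of_optimal(peptide, matrix):
    """Score of the peptide as a percentage of the best achievable score."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must match the number of matrix positions")
    best = sum(max(column.values()) for column in matrix)
    return 100.0 * matrix_score(peptide, matrix) / best


# Hypothetical allele-specific matrix: one dict of residue weights per position.
TOY_MATRIX = [
    {"L": 2.0, "V": 1.0},
    {"A": 1.0},
    {"K": 3.0, "R": 1.5},
]

if __name__ == "__main__":
    for peptide in ("LAK", "VAR", "GAK"):
        print(peptide, round(percent_of_optimal(peptide, TOY_MATRIX), 1))
```

Because each allele has its own matrix and its own optimum, a percentage obtained for one allele is not directly comparable with a percentage obtained for another, which is the caution given in the paragraph above.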

Population coverage calculations 

Population coverage was calculated by the Population
coverage tool on the IEDB server
(http://tools.immuneepitope.org/tools/population/iedb) for South
African Whites and Blacks for both the published
class I- and II- epitopes and an adapted "best mix" which
took into account the most prevalent alleles and epitope 
variants in South Africa and their predicted binding. In 
order to assess the efficacy of a vaccine epitope, the IEDB 
resource Tool calculates the fraction of individuals pre- 
dicted to respond to a given set of epitopes with known 
MHC restrictions (http://tools.immuneepitope.org/main/html/analysis_tools.html,
last accessed 2011-04-20). The
calculation is based on input HLA genotypic frequencies. 
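
To make the idea of a coverage calculation concrete, the sketch below estimates the fraction of individuals carrying at least one allele that restricts at least one epitope in the set. It is a simplification and not the IEDB tool itself: it assumes Hardy-Weinberg proportions within each locus and independence between loci, and the allele frequencies and function names are hypothetical.

```python
def locus_coverage(allele_freqs, covered_alleles):
    """Fraction of individuals with at least one covered allele at one locus.

    Assumes Hardy-Weinberg proportions: an individual is uncovered only if
    both chromosomes carry an uncovered allele.
    """
    p = sum(freq for allele, freq in allele_freqs.items() if allele in covered_alleles)
    return 1.0 - (1.0 - p) ** 2


def population_coverage(loci, covered_alleles):
    """Combine loci, assuming they are inherited independently."""
    uncovered = 1.0
    for allele_freqs in loci.values():
        uncovered *= 1.0 - locus_coverage(allele_freqs, covered_alleles)
    return 1.0 - uncovered


# Hypothetical allele frequencies (not taken from any population dataset).
TOY_LOCI = {
    "HLA-A": {"A*02:01": 0.10, "A*30:01": 0.15},
    "HLA-B": {"B*58:02": 0.20},
}
# Alleles predicted to restrict at least one epitope in the candidate set.
TOY_COVERED = {"A*02:01", "B*58:02"}

if __name__ == "__main__":
    print(round(100 * population_coverage(TOY_LOCI, TOY_COVERED), 2))
```

Adding an epitope restricted by a further prevalent allele (for example A*30:01 in the toy data) raises the estimate, which is the effect exploited by the "best mix" described above.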

Recently released web-based software, OptiTope [64], 
looks at viral and host variation in order to customise 
and optimise candidate epitopes to a specific population. 
Since this approach used the same parameters as this 
study, it was decided to compare the coverage of the 
chosen epitopes with the coverage of putative optimal epi- 
tope vaccines generated in OptiTope using similar biases. 
For this reason OptiTope was asked to generate an opti- 
mal epitope vaccine from an alignment of "common" 
HCV sequences in a Caucasian population. This HCV 
sample data (available in OptiTope), while biased, was very 
comprehensive and consisted of an alignment of >100 
sequences from 10 different HCV proteins (Core, E1, E2,
NS2, NS3, NS4A, NS4B, NS5A, NS5B and p7) but only
included the "common" subtypes 1a, 1b, 2a and 3a.